1 Overview

Social determinants of health (SDOH), such as inadequate housing, lack of access to reliable transportation, and insufficient food access, have increasingly been recognized as factors that adversely impact health outcomes.(1) In this project, with guidance from project advisors John Holmes, PhD and Sameh Saleh, MD, I attempt to determine if there is an observed association between low food access and type 2 diabetes prevalence in the continental United States for the most recent year for which data about both food access and diabetes prevalence were available, 2015. It is possible that there is a positive association between limited food access and type 2 diabetes prevalence, as individuals in low food access areas may have little choice but to rely on convenience foods and fast foods for their nutritional intake, which may increase the risk of developing type 2 diabetes. I will use the Agency for Healthcare Research and Quality (AHRQ) Social Determinants of Health dataset for this work.

Link to final project Github repository: https://github.com/mcgreevj/BMIN503_Final_Project

2 Introduction

Type 2 diabetes is a major health problem in the United States, with significant morbidity and mortality for the at least 33 million people who suffer with this condition. Additionally, the economic costs of type 2 diabetes in the aggregate are substantial, totaling $327 billion in 2017.(2) Tragically, type 2 diabetes is largely a preventable disease that has modifiable risk factors (weight, diet, exercise).(3) Increasingly, there has been recognition that social determinants of health can serve as barriers that prevent individuals from being able to successfully modify risk factors. For example, individuals who lack access to insurance and/or reliable transportation may not be able to seek routine medical care for education, prevention, monitoring, and treatment purposes. As noted earlier, diet is a major factor in the development of type 2 diabetes. And while ideal diets to prevent type 2 diabetes have been advised, not every American necessarily has access to food to be able to consume these recommended diets.(4) This is the phenomenon of food insecurity. Food insecurity is formally defined by the U.S. Department of Agriculture (USDA) as: “the limited or uncertain availability of nutritionally adequate and safe foods, or limited or uncertain ability to acquire acceptable foods in socially acceptable ways.”(5) A concept related to food insecurity is the phenomenon of food deserts – areas that are low income and that have few or no supermarkets or other sources of healthy food options. As formally defined by the USDA, a food desert is: “A census tract that meets both low-income and low-access criteria including: 1. poverty rate is greater than or equal to 20 percent OR median family income does not exceed 80 percent statewide (rural/urban) or metro-area (urban) median family income; 2. at least 500 people or 33 percent of the population located more than 1 mile (urban) or 10 miles (rural) from the nearest supermarket or large grocery store.”(6) In the absence of options for healthy food, individuals residing in these communities may need to resort to convenience and fast foods for some or many of their meals. In doing so, they may increase the risk of developing type 2 diabetes.(7)

This project is interdisciplinary in that it combines census tract and county data with health survey data. In addition, the AHRQ SDOH database aggregates data from multiple surveys and sources. The fields and topics that contribute to understanding of this issue are multiple: public health, clinical medicine, sociology, health disparities, economics, geography, history (in that the reasons many communities have limited access to food involves historical structural decisions, including forms of structural racism such as redlining). Moreover, the findings of this project may have social justice, ethical, cultural, economic, and political implications. Based on collaboration with my mentors, I have learned the importance of familiarizing myself with data sources and data definitions, notably the Data Source Documentation guide from AHRQ for the SDOH data. This step was necessary from an exploratory standpoint to contemplate what kind of project I might do and also from a practical standpoint to understand how to properly and specifically answer my project question with the available data. I have also learned that some questions may be challenging to answer exactly as initially envisioned. For example, when considering this project at first, I had hoped to show changes over time in the association between limited food access and type 2 diabetes prevalence, perhaps including recent data from 2020 or 2021. However, after close inspection of this SDOH data set, I came to realize that not all data is available for every year of interest. Thus I narrrowed the scope of my project a bit, given that data for limited food access and diabetes prevalence intersected in 2015, which is the most recent year that both types of data are available.

The identification of an independent, positive association between low food access and type 2 diabetes prevalence would help support the case for policies and programs (both at the national and state levels) to improve food access in impacted census tracts. From a social justice standpoint, those with power in society (elected office holders, business and community leaders, members of the health care community including researchers) have a responsibility to seek ways to provide a known and effective preventive therapy (whether a vaccine or a grocery store) to those without such access currently, to prevent disease. There is also a national public health and cost dimension to this topic. People with diabetes suffer complications and morbidity related to diabetes, impairing their ability to support themselves, live independent lives, and contribute to a community’s economic activity and well-being. At the same time, the national cost of providing care for individuals with type 2 diabetes is borne by all Americans, in part because individuals living in low access food areas are more likely to have public (Medicaid, Medicare) insurance that is funded by U.S. taxpayers, if individuals in those areas have health insurance at all. Indeed, Medicaid and Medicare combined provide health coverage to approximately 36% of Americans today and that percentage continues to rise over time.(8) “Uncompensated” health care for individuals with type 2 diabetes who lack any type of insurance is also a cost borne by individual health care organizations, governments, and ultimately, taxpayers. Greater awareness of the relationships between SDOH and health outcomes, combined with a frank accounting of the costs of inaction on this subject, will hopefully spur actions that can mitigate SDOH and improve both individual and community health.

My project primarily aims to generate a descriptive analysis of the available low food access and diabetes prevalence data. I also attempt to formally characterize the relationship between low food access and diabetes prevalance using a linear regression model. Lastly, I create multivariable linear regression models to adjust for potentially confounding variables in the relationship between low food access and diabetes prevalence, such as obesity, median household income, and percentage of population that is uninsured.

3 Methods

3.1 Datasets and variables

For this project, I used the AHRQ Social Determinants of Health database (available at https://www.ahrq.gov/sdoh/data-analytics/sdoh-data.html). I reviewed the data source documentation document (https://www.ahrq.gov/sites/default/files/wysiwyg/sdoh/SDOH-Data-Sources-Documentation-v1-Final.pdf) to learn about potential variables for this project and their availability. Ultimately, I planned to use two main variables available in the SDOH database: CHR_PCT_DIABETES and FARA_PCT_LA_HALF_10_POP.

CHR_PCT_DIABETES is defined by the data source document above as the “percentage of adults with diagnosed diabetes (ages 20 and over).” Data for this variable is provided by the Centers for Disease Control and Prevention (CDC) Interactive Diabetes Atlas which in turn uses CDC Behavioral Risk Factor Surveillance System (BRFSS) data to estimate prevalence by U.S. county. Notably for purposes of this analysis that used 2015 data, starting in 2015, BRFSS included both landline and cellphone survey data to obtain diabetes prevalence data. Previously, this data had been collected exclusively via landline surveys.

Importantly, CHR_PCT_DIABETES is provided by U.S. county.

FARA_PCT_LA_HALF_10_POP is defined as the “percentage of [healthy food] low access population measured at 1/2 mile for urban areas and 10 miles for rural areas.” For this variable, low access to healthy food is defined as being far from a supermarket, supercenter, or large grocery store, so more than 1/2 mile in urban areas and more than 10 miles in rural areas for this variable.

FARA_PCT_LA_HALF_10_POP comes from the Food Access Research Atlas (FARA), created by the Economic Research Service (ERS) at the USDA. Of note, a helpful mapping tool is offered by the ERS and is available here: https://www.ers.usda.gov/data-products/food-access-research-atlas/go-to-the-atlas/.

In contrast to CHR_PCT_DIABETES, FARA_PCT_LA_HALF_10_POP is provided at the census tract level.

For further reference, the following is the ERS methodology for calculating distance to the nearest grocery store, excerpted from the ERS definitions document available at https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/:

“First, the entire country is divided into ½-kilometer (about 1/3-mile)-square grids, and data on the population are aerially allocated to these grids…Distance to the nearest supermarket is measured for each grid cell by calculating the distance between the geographic center of the ½-kilometer square grid that contains estimates of the population (number of people and other subgroup characteristics) and the center of the grid with the nearest supermarket.”

The most recent year that data for both CHR_PCT_DIABETES and FARA_PCT_LA_HALF_10_POP is available is 2015.

After consulting with my project advisors about variable choices, I loaded the appropriate datasets into R.

For CHR_PCT_DIABETES, the 2015 data (by county) was available as an Excel file here: https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fwww.ahrq.gov%2Fsites%2Fdefault%2Ffiles%2Fwysiwyg%2Fsdoh%2FSDOH_2015_COUNTY_1_0.xlsx&wdOrigin=BROWSELINK.

For FARA_PCT_LA_HALF_10_POP, the 2015 data (by tract) was available as an Excel file here: https://www.ahrq.gov/downloads/sdoh/sdoh_2015_tract_1_0.xlsx.

3.2 Retrieving, inspecting, and cleaning data

Here I am loading the package readxl that will allow me to read Excel data

library("readxl")

Read in the 2015 low access data, reported by census tract

Lowaccess_raw<- read_excel("C:\\Users\\McGreevJ\\Desktop\\BMI 5030 Data Science\\BMIN503_Final_Project\\Adjusted datasets\\data.sdoh_2015_tract_1_0.xlsx")
dim(Lowaccess_raw) #74133 x 405
## [1] 74133   405

Read in the 2015 diabetes prevalence data, reported by county

Diabetes_raw<- read_excel("C:\\Users\\McGreevJ\\Desktop\\BMI 5030 Data Science\\BMIN503_Final_Project\\Adjusted datasets\\data.SDOH_2015_COUNTY_1_0.xlsx")
## Warning: Expecting logical in RJ1671 / R1671C478: got '46123'
## Warning: Expecting logical in RJ1763 / R1763C478: got '32510'
## Warning: Expecting logical in RK1763 / R1763C479: got '41025'
## Warning: Expecting logical in RL1763 / R1763C480: got '41037'
## Warning: Expecting logical in RJ2797 / R2797C478: got '49017'
## Warning: Expecting logical in RK2797 / R2797C479: got '49019'
## Warning: Expecting logical in RL2797 / R2797C480: got '49025'
## Warning: Expecting logical in RM2797 / R2797C481: got '49055'
## Warning: Expecting logical in RJ2842 / R2842C478: got '51760'
dim(Diabetes_raw) #3234 x 979
## [1] 3234  979

Here I loaded other relevant packages and then checked for differences between the Lowaccess_raw and Diabetes_raw datasets; notably, to see if the same COUNTYFIPS codes were represented in both.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(cowplot)
library(dplyr)


setdiff(Lowaccess_raw$COUNTYFIPS, Diabetes_raw$COUNTYFIPS)
## [1] "69085"
#COUNTYFIPS 69085 (Northern Mariana Islands) is the only difference. 69085 is represented in Lowaccess_raw but not in Diabetes_raw.

Once loaded into R, I manipulated each of these raw datasets so as to limit them to the variables of interest, tracking dimensions of the dataframes as I went along.

#Define "Lowaccess" as a subset of "Lowaccess_raw" to include TRACTFIPS, COUNTYFIPS, STATEFIPS, STATE, COUNTY, REGION, ACS_TOT_POP_WT, FARA_TRACT_LILA_HALF_10, FARA_PCT_LA_HALF_10_POP

Lowaccess<- dplyr::select(Lowaccess_raw, TRACTFIPS, COUNTYFIPS, STATEFIPS, STATE, COUNTY, REGION, ACS_TOT_POP_WT, FARA_TRACT_LILA_HALF_10, FARA_PCT_LA_HALF_10_POP)
dim(Lowaccess) #74133 x 9
## [1] 74133     9
#Define "Diabetes" as a subset of "Diabetes_raw" to include COUNTYFIPS, STATEFIPS, STATE, COUNTY, REGION, ACS_TOT_POP_WT, ACS_MEDIAN_AGE, ACS_PCT_EMPLOYED, ACS_MEDIAN_HH_INC, ACS_PCT_HH_INC_10000, ACS_PCT_HH_INC_100000, ACS_PER_CAPITA_INC, ACS_PCT_POV_AIAN, ACS_MEDIAN_HOME_VALUE, ACS_PCT_MEDICAID_ANY, ACS_PCT_MEDICAID_ANY_BELOW64, ACS_PCT_MEDICARE_ONLY, ACS_PCT_PRIVATE_ANY, ACS_PCT_UNINSURED, CEN_AIAN_NH_IND, ACS_TOT_HH, ACS_AVG_HH_SIZE, ACS_PCT_CHILD_1FAM, CHR_PCT_ADULT_OBESITY, CHR_PCT_DIABETES
Diabetes<- dplyr::select(Diabetes_raw, COUNTYFIPS, STATEFIPS, STATE, COUNTY, REGION, ACS_TOT_POP_WT, ACS_MEDIAN_AGE, ACS_PCT_EMPLOYED, ACS_MEDIAN_HH_INC, ACS_PCT_HH_INC_10000, ACS_PCT_HH_INC_100000, ACS_PER_CAPITA_INC, ACS_PCT_POV_AIAN, ACS_MEDIAN_HOME_VALUE, ACS_PCT_MEDICAID_ANY, ACS_PCT_MEDICAID_ANY_BELOW64, ACS_PCT_MEDICARE_ONLY, ACS_PCT_PRIVATE_ANY, ACS_PCT_UNINSURED, CEN_AIAN_NH_IND, ACS_TOT_HH, ACS_AVG_HH_SIZE, ACS_PCT_CHILD_1FAM, CHR_PCT_ADULT_OBESITY, CHR_PCT_DIABETES)
dim(Diabetes) #3234 x 25
## [1] 3234   25

In anticipation of ultimately needing a county population value in order to calculate a weighted mean of FARA_PCT_LA_HALF_10_POP by county, I renamed ACS_TOT_POP_WT within the Diabetes dataframe to County_Pop. I then confirmed the column had been renamed and checked the dimensions of Diabetes to assure they remained unchanged.

Diabetes<- dplyr::rename(Diabetes, "County_Pop" = ACS_TOT_POP_WT)
head(Diabetes)
## # A tibble: 6 × 25
##   COUNTYFIPS STATE…¹ STATE COUNTY REGION Count…² ACS_M…³ ACS_P…⁴ ACS_M…⁵ ACS_P…⁶
##   <chr>      <chr>   <chr> <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 01001      01      Alab… Autau… South    55221    37.7    92.4   51281    6.14
## 2 01003      01      Alab… Baldw… South   195121    42.2    92.5   50254    6.36
## 3 01005      01      Alab… Barbo… South    26932    38.8    82.4   32964   13.5 
## 4 01007      01      Alab… Bibb … South    22604    38.9    91.7   38678    7.74
## 5 01009      01      Alab… Bloun… South    57710    40.7    92.3   45813    7.77
## 6 01011      01      Alab… Bullo… South    10678    39.3    82.0   31938   18.9 
## # … with 15 more variables: ACS_PCT_HH_INC_100000 <dbl>,
## #   ACS_PER_CAPITA_INC <dbl>, ACS_PCT_POV_AIAN <dbl>,
## #   ACS_MEDIAN_HOME_VALUE <dbl>, ACS_PCT_MEDICAID_ANY <dbl>,
## #   ACS_PCT_MEDICAID_ANY_BELOW64 <dbl>, ACS_PCT_MEDICARE_ONLY <dbl>,
## #   ACS_PCT_PRIVATE_ANY <dbl>, ACS_PCT_UNINSURED <dbl>, CEN_AIAN_NH_IND <chr>,
## #   ACS_TOT_HH <dbl>, ACS_AVG_HH_SIZE <dbl>, ACS_PCT_CHILD_1FAM <dbl>,
## #   CHR_PCT_ADULT_OBESITY <dbl>, CHR_PCT_DIABETES <dbl>, and abbreviated …
dim(Diabetes) #3234 x 25
## [1] 3234   25

I performed an innerjoin of Diabetes and Lowaccess by the variable that was common to both, COUNTYFIPS, recognizing that this step would only preserve rows that both dataframes had in common. I inspected the header of the joined dataframe and checked its dimensions again, noting the number of rows had been reduced from 74133 to 74132.

#Innerjoin Diabetes and Lowaccess
test_innerjoin<- inner_join(Diabetes, Lowaccess, by = "COUNTYFIPS")
head(test_innerjoin)
## # A tibble: 6 × 33
##   COUNTYFIPS STATEFIPS.x STATE.x COUNT…¹ REGIO…² Count…³ ACS_M…⁴ ACS_P…⁵ ACS_M…⁶
##   <chr>      <chr>       <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>
## 1 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 2 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 3 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 4 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 5 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 6 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## # … with 24 more variables: ACS_PCT_HH_INC_10000 <dbl>,
## #   ACS_PCT_HH_INC_100000 <dbl>, ACS_PER_CAPITA_INC <dbl>,
## #   ACS_PCT_POV_AIAN <dbl>, ACS_MEDIAN_HOME_VALUE <dbl>,
## #   ACS_PCT_MEDICAID_ANY <dbl>, ACS_PCT_MEDICAID_ANY_BELOW64 <dbl>,
## #   ACS_PCT_MEDICARE_ONLY <dbl>, ACS_PCT_PRIVATE_ANY <dbl>,
## #   ACS_PCT_UNINSURED <dbl>, CEN_AIAN_NH_IND <chr>, ACS_TOT_HH <dbl>,
## #   ACS_AVG_HH_SIZE <dbl>, ACS_PCT_CHILD_1FAM <dbl>, …
dim(test_innerjoin) #74132 x 33
## [1] 74132    33

I expected the one COUNTYFIPS discrepancy (Northern Mariana Islands, code 69085 as noted previously) to have been excluded from this innerjoined dataframe and checked for its presence.

#Visually inspecting to see if Northern Mariana (69085) is present in test_innerjoin
View(test_innerjoin)

A search of the table above for 69085 yielded no results, so Northern Mariana had been excluded, as desired and as expected.

I then created a new variable to represent weighted population, census_tract_pop_wt, which is the tract population divided by the county population. Again, I inspected the header to verify the variable had been successfully added. I once again checked dimensions and as expected, noted the number of variables had increased by one, from 33 to 34.

test_innerjoin<- dplyr::mutate(test_innerjoin, "census_tract_pop_wt" = (ACS_TOT_POP_WT/County_Pop))
head(test_innerjoin)
## # A tibble: 6 × 34
##   COUNTYFIPS STATEFIPS.x STATE.x COUNT…¹ REGIO…² Count…³ ACS_M…⁴ ACS_P…⁵ ACS_M…⁶
##   <chr>      <chr>       <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>
## 1 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 2 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 3 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 4 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 5 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## 6 01001      01          Alabama Autaug… South     55221    37.7    92.4   51281
## # … with 25 more variables: ACS_PCT_HH_INC_10000 <dbl>,
## #   ACS_PCT_HH_INC_100000 <dbl>, ACS_PER_CAPITA_INC <dbl>,
## #   ACS_PCT_POV_AIAN <dbl>, ACS_MEDIAN_HOME_VALUE <dbl>,
## #   ACS_PCT_MEDICAID_ANY <dbl>, ACS_PCT_MEDICAID_ANY_BELOW64 <dbl>,
## #   ACS_PCT_MEDICARE_ONLY <dbl>, ACS_PCT_PRIVATE_ANY <dbl>,
## #   ACS_PCT_UNINSURED <dbl>, CEN_AIAN_NH_IND <chr>, ACS_TOT_HH <dbl>,
## #   ACS_AVG_HH_SIZE <dbl>, ACS_PCT_CHILD_1FAM <dbl>, …
dim(test_innerjoin) #74132 x 34
## [1] 74132    34

Having visually inspected my data, I noted that some tracts had NA values for FARA_PCT_LA_HALF_10_POP. Recognizing that my weighting calculations would produce an error if I attempted to perform them using NA values, I assessed to see how many NA values were present for this variable.

sum(is.na(test_innerjoin$FARA_PCT_LA_HALF_10_POP))
## [1] 1625

Finding 1625 NA values, I removed these NA values from test_innerjoin so as to allow me to perform the weighting calculations.

test_innerjoin<- test_innerjoin %>% drop_na(FARA_PCT_LA_HALF_10_POP)

I then inspected my updated test_innerjoin dataframe dimensions to verify that all 1625 NA values had been removed. Indeed, the rows had decreased from 74132 to 72507 as expected. Visual inspection of test_innerjoin confirmed no remaining NA values for FARA_PCT_LA_HALF_10_POP.

dim(test_innerjoin) 
## [1] 72507    34
View(test_innerjoin) 

I created a new dataframe based on test_innerjoin that contained a newly-defined weighted mean variable, wm_FARA, using the weighted.mean function, which was the FARA_PCT_LA_HALF_10_POP for a tract multiplied by census_tract_pop_wt. I summarized wm_FARA and grouped results by COUNTYFIPS (one row per county). Other variables of interest for the analysis were included as well.

colnames(test_innerjoin)
##  [1] "COUNTYFIPS"                   "STATEFIPS.x"                 
##  [3] "STATE.x"                      "COUNTY.x"                    
##  [5] "REGION.x"                     "County_Pop"                  
##  [7] "ACS_MEDIAN_AGE"               "ACS_PCT_EMPLOYED"            
##  [9] "ACS_MEDIAN_HH_INC"            "ACS_PCT_HH_INC_10000"        
## [11] "ACS_PCT_HH_INC_100000"        "ACS_PER_CAPITA_INC"          
## [13] "ACS_PCT_POV_AIAN"             "ACS_MEDIAN_HOME_VALUE"       
## [15] "ACS_PCT_MEDICAID_ANY"         "ACS_PCT_MEDICAID_ANY_BELOW64"
## [17] "ACS_PCT_MEDICARE_ONLY"        "ACS_PCT_PRIVATE_ANY"         
## [19] "ACS_PCT_UNINSURED"            "CEN_AIAN_NH_IND"             
## [21] "ACS_TOT_HH"                   "ACS_AVG_HH_SIZE"             
## [23] "ACS_PCT_CHILD_1FAM"           "CHR_PCT_ADULT_OBESITY"       
## [25] "CHR_PCT_DIABETES"             "TRACTFIPS"                   
## [27] "STATEFIPS.y"                  "STATE.y"                     
## [29] "COUNTY.y"                     "REGION.y"                    
## [31] "ACS_TOT_POP_WT"               "FARA_TRACT_LILA_HALF_10"     
## [33] "FARA_PCT_LA_HALF_10_POP"      "census_tract_pop_wt"
test_innerjoin_wtmean<- test_innerjoin %>% dplyr::group_by(COUNTYFIPS, COUNTY.x, STATE.x, REGION.x, County_Pop, ACS_MEDIAN_HH_INC, ACS_PER_CAPITA_INC, ACS_PCT_POV_AIAN, ACS_MEDIAN_HOME_VALUE, ACS_PCT_UNINSURED, CHR_PCT_ADULT_OBESITY, CHR_PCT_DIABETES) %>% dplyr::summarise(wm_FARA = weighted.mean(FARA_PCT_LA_HALF_10_POP, census_tract_pop_wt)) %>% as.data.frame()
## `summarise()` has grouped output by 'COUNTYFIPS', 'COUNTY.x', 'STATE.x',
## 'REGION.x', 'County_Pop', 'ACS_MEDIAN_HH_INC', 'ACS_PER_CAPITA_INC',
## 'ACS_PCT_POV_AIAN', 'ACS_MEDIAN_HOME_VALUE', 'ACS_PCT_UNINSURED',
## 'CHR_PCT_ADULT_OBESITY'. You can override using the `.groups` argument.

Once again, I inspected this new dataframe to verify that I had created wm_FARA successfully and that each county had a single row in the dataframe. I again checked the dimensions of this new dataframe.

head(test_innerjoin_wtmean)
##   COUNTYFIPS       COUNTY.x STATE.x REGION.x County_Pop ACS_MEDIAN_HH_INC
## 1      01001 Autauga County Alabama    South      55221             51281
## 2      01003 Baldwin County Alabama    South     195121             50254
## 3      01005 Barbour County Alabama    South      26932             32964
## 4      01007    Bibb County Alabama    South      22604             38678
## 5      01009  Blount County Alabama    South      57710             45813
## 6      01011 Bullock County Alabama    South      10678             31938
##   ACS_PER_CAPITA_INC ACS_PCT_POV_AIAN ACS_MEDIAN_HOME_VALUE ACS_PCT_UNINSURED
## 1              24974            52.20                141300             10.12
## 2              27317             9.15                169300             12.96
## 3              16824             0.00                 92200             15.51
## 4              18431             4.65                102700              9.66
## 5              20532             6.60                119800             11.64
## 6              17580             0.00                 68600             18.09
##   CHR_PCT_ADULT_OBESITY CHR_PCT_DIABETES  wm_FARA
## 1                  37.5             14.2 54.81617
## 2                  31.0             11.3 40.21763
## 3                  44.3             18.0 31.32199
## 4                  37.8             14.9  1.18436
## 5                  34.4             14.3 13.44140
## 6                  39.4             20.0 70.79069
dim(test_innerjoin_wtmean) #3140 x 13
## [1] 3140   13

I noted that while the earlier county-level dataframe (Diabetes) had 3234 rows, test_innerjoin_wtmean only had 3140 rows. I went about an analysis to understand why there was a difference in row counts starting with setdiff.

setdiff(Diabetes$COUNTYFIPS, test_innerjoin_wtmean$COUNTYFIPS)
##  [1] "02158" "46102" "60000" "60010" "60020" "60030" "60040" "60050" "66010"
## [10] "69000" "69100" "69110" "69120" "72001" "72003" "72005" "72007" "72009"
## [19] "72011" "72013" "72015" "72017" "72019" "72021" "72023" "72025" "72027"
## [28] "72029" "72031" "72033" "72035" "72037" "72039" "72041" "72043" "72045"
## [37] "72047" "72049" "72051" "72053" "72054" "72055" "72057" "72059" "72061"
## [46] "72063" "72065" "72067" "72069" "72071" "72073" "72075" "72077" "72079"
## [55] "72081" "72083" "72085" "72087" "72089" "72091" "72093" "72095" "72097"
## [64] "72099" "72101" "72103" "72105" "72107" "72109" "72111" "72113" "72115"
## [73] "72117" "72119" "72121" "72123" "72125" "72127" "72129" "72131" "72133"
## [82] "72135" "72137" "72139" "72141" "72143" "72145" "72147" "72149" "72151"
## [91] "72153" "78010" "78020" "78030"

There were 94 counties in Diabetes that were not in test_innerjoin_wtmean. Using “View(Diabetes)” I visually inspected the Diabetes table again to look for a pattern among these 94 counties in question.

These counties appeared to be in Alaska and U.S. territories. Examples included: 02158 (Kusilvak Census Area, AK), 60010 (Am Samoa), 72151 (Puerto Rico), 78030 (US Virgin Islands). Excluding these counties from test_innerjoin_wtmean was acceptable given that I did not have the ability to plot choropleths (see counties RDS file below) with these non-continental U.S. counties.

I next began preparing for the exploratory analysis by loading various necessary packages first and then loading county geography data.

library(rgdal)
## Warning: package 'rgdal' was built under R version 4.2.2
## Loading required package: sp
## Please note that rgdal will be retired during 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2022/04/12/evolution.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-2, (SVN revision 1183)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.5.2, released 2022/09/02
## Path to GDAL shared files: C:/Users/McGreevJ/AppData/Local/Programs/R/R-4.2.1/library/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 8.2.1, January 1st, 2022, [PJ_VERSION: 821]
## Path to PROJ shared files: C:/Users/McGreevJ/AppData/Local/Programs/R/R-4.2.1/library/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.5-1
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.2.2
library(ggplot2)
library(RColorBrewer)
library(plyr)
## Warning: package 'plyr' was built under R version 4.2.2
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
library(sf)
## Warning: package 'sf' was built under R version 4.2.2
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tidycensus)
## Warning: package 'tidycensus' was built under R version 4.2.2
library(ggsn)
## Warning: package 'ggsn' was built under R version 4.2.2
## Loading required package: grid
counties<- readRDS(gzcon(url("https://raw.githubusercontent.com/HimesGroup/BMIN503/master/DataFiles/uscounties_2010.rds")))

st_crs(counties) <- st_crs(counties)
## old-style crs object detected; please recreate object with a recent sf::st_crs()
## old-style crs object detected; please recreate object with a recent sf::st_crs()
#Using base plot function to check that counties contains polygon data: 
plot(counties)

#Using ggplot to check that counties contains polygon data:
ggplot() +
    geom_sf(data = counties)

Here I began the process of joining test_innerjoin_wtmean and counties but first needed to create a common variable by which to join. I converted State to a 2 digit number in counties, then concatenated State and County into a 5 digit number, a mutated variable in counties called “COUNTYFIPS” to correspond to “COUNTYFIPS” in the combined data set, test_innerjoin_wtmean.

counties$STATE<- str_pad(counties$STATE, width=2, side="left", pad="0")
head(counties)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.13999 ymin: 31.19518 xmax: -85.12342 ymax: 34.8922
## Geodetic CRS:  WGS 84
##           GEO_ID STATE COUNTY     NAME   LSAD CENSUSAREA
## 1 0500000US01001    01    001  Autauga County    594.436
## 2 0500000US01009    01    009   Blount County    644.776
## 3 0500000US01017    01    017 Chambers County    596.531
## 4 0500000US01021    01    021  Chilton County    692.854
## 5 0500000US01033    01    033  Colbert County    592.619
## 6 0500000US01045    01    045     Dale County    561.150
##                         geometry
## 1 MULTIPOLYGON (((-86.49677 3...
## 2 MULTIPOLYGON (((-86.5778 33...
## 3 MULTIPOLYGON (((-85.18413 3...
## 4 MULTIPOLYGON (((-86.51734 3...
## 5 MULTIPOLYGON (((-88.13999 3...
## 6 MULTIPOLYGON (((-85.41644 3...
counties<- counties %>% dplyr::mutate(COUNTYFIPS = paste(STATE, COUNTY, sep = ""))
head(counties)
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.13999 ymin: 31.19518 xmax: -85.12342 ymax: 34.8922
## Geodetic CRS:  WGS 84
##           GEO_ID STATE COUNTY     NAME   LSAD CENSUSAREA
## 1 0500000US01001    01    001  Autauga County    594.436
## 2 0500000US01009    01    009   Blount County    644.776
## 3 0500000US01017    01    017 Chambers County    596.531
## 4 0500000US01021    01    021  Chilton County    692.854
## 5 0500000US01033    01    033  Colbert County    592.619
## 6 0500000US01045    01    045     Dale County    561.150
##                         geometry COUNTYFIPS
## 1 MULTIPOLYGON (((-86.49677 3...      01001
## 2 MULTIPOLYGON (((-86.5778 33...      01009
## 3 MULTIPOLYGON (((-85.18413 3...      01017
## 4 MULTIPOLYGON (((-86.51734 3...      01021
## 5 MULTIPOLYGON (((-88.13999 3...      01033
## 6 MULTIPOLYGON (((-85.41644 3...      01045

Also before joining test_innerjoin_wtmean and counties, I inspected both to verify that the number of COUNTYFIPS codes were the same in each. This process yielded the following results:

counties %>% summarise(n_distinct(COUNTYFIPS)) #3109
##   n_distinct(COUNTYFIPS)
## 1                   3109
test_innerjoin_wtmean %>% summarise(n_distinct(COUNTYFIPS)) #3140
##   n_distinct(COUNTYFIPS)
## 1                   3140

So 31 rows appear to be different between counties and test_innerjoin_wtmean in terms of COUNTYFIPS values

setdiff(counties$COUNTYFIPS, test_innerjoin_wtmean$COUNTYFIPS)
## [1] "46113" "51515"

Yields 2 COUNTYFIPS, 46113 and 51515 that are different, both present in counties but not in test_innerjoin_wtmean

These 2 COUNTYFIPS are:

Shannon County, SD (FIPS code = 46113). After investigation, this county was renamed Oglala Lakota County and assigned a new FIPS code (46102) effective in 2014.

Bedford City, VA (FIPS code=51515). After investigation, I discovered that in 2013, Bedford City, an independent city, merged with Bedford County (FIPS code=51019). Apparently, Bedford City category codes are the same as those for Bedford County.

However, this prompted me to ask “what about the other rows that are different? What are those rows?”

Using the alternate sequence for setdiff, with test_innerjoin_wt mean first:

setdiff(test_innerjoin_wtmean$COUNTYFIPS, counties$COUNTYFIPS)
##  [1] "02013" "02016" "02020" "02050" "02060" "02068" "02070" "02090" "02100"
## [10] "02105" "02110" "02122" "02130" "02150" "02164" "02170" "02180" "02185"
## [19] "02188" "02195" "02198" "02220" "02230" "02240" "02261" "02275" "02282"
## [28] "02290" "15001" "15003" "15005" "15007" "15009"

Therefore there were 33 differences, which represent COUNTYFIPS that are present in test_innerjoin_wtmean but not present in counties.

These differences are for Alaska (State code 02) and Hawaii (State code 15). These states are not represented in counties.

Based on this analysis, I concluded that an innerjoin of counties and test_innerjoin_wtmean would likely omit the Hawaii and Alaska data that is in test_innerjoin_wtmean. For purposes of this project, this omission would be acceptable as my choropleth creation tools are presently limited to the continental United States.

Given the conclusion above, I felt it was safe to proceed with an innerjoin of counties and test_innerjoin_wtmean by COUNTYFIPS, to create a new dataframe, test_counties_inner.

test_counties_inner<- inner_join(counties, test_innerjoin_wtmean, by = 'COUNTYFIPS')

I inspected test_counties_inner to verify that Shannon County, Bedford City, Hawaii, and Alaska rows (as described above) had been removed as expected.

str(test_counties_inner) 
## Classes 'sf' and 'data.frame':   3107 obs. of  20 variables:
##  $ GEO_ID               : Factor w/ 3221 levels "0500000US01001",..: 1 5 9 11 17 23 26 33 40 42 ...
##  $ STATE                : chr  "01" "01" "01" "01" ...
##  $ COUNTY               : Factor w/ 325 levels "001","003","005",..: 1 6 13 16 23 30 34 43 53 55 ...
##  $ NAME                 : Factor w/ 1909 levels "Abbeville","Acadia",..: 89 174 309 341 387 460 567 730 986 1009 ...
##  $ LSAD                 : Factor w/ 8 levels "Borough","CA",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ CENSUSAREA           : num  594 645 597 693 593 ...
##  $ COUNTYFIPS           : chr  "01001" "01009" "01017" "01021" ...
##  $ COUNTY.x             : chr  "Autauga County" "Blount County" "Chambers County" "Chilton County" ...
##  $ STATE.x              : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ REGION.x             : chr  "South" "South" "South" "South" ...
##  $ County_Pop           : num  55221 57710 34079 43819 54444 ...
##  $ ACS_MEDIAN_HH_INC    : num  51281 45813 34177 41627 40576 ...
##  $ ACS_PER_CAPITA_INC   : num  24974 20532 21071 21399 22546 ...
##  $ ACS_PCT_POV_AIAN     : num  52.2 6.6 36.84 6.33 18.24 ...
##  $ ACS_MEDIAN_HOME_VALUE: num  141300 119800 80800 100100 99400 ...
##  $ ACS_PCT_UNINSURED    : num  10.1 11.6 13.9 16 11.1 ...
##  $ CHR_PCT_ADULT_OBESITY: num  37.5 34.4 40.3 35.3 32.5 37.3 36.1 42 35 33.6 ...
##  $ CHR_PCT_DIABETES     : num  14.2 14.3 17 14.7 17.6 15.6 14.6 16.1 14.9 12.4 ...
##  $ wm_FARA              : num  54.8 13.4 30.7 13.9 52.8 ...
##  $ geometry             :sfc_MULTIPOLYGON of length 3107; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:10, 1:2] -86.5 -86.7 -86.8 -86.9 -86.9 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:19] "GEO_ID" "STATE" "COUNTY" "NAME" ...
View(test_counties_inner)

Inspection revealed 3107 observations of 20 variables and confirmed that Hawaii and Alaska data, as well as Shannon County, SD and Bedford City, VA data, had been removed.

4 Results

4.1 Results of exploratory analysis

I began my exploratory analysis, first by inspecting column names in test_counties_inner to verify that all variables needed for this analysis were present.

colnames(test_counties_inner)
##  [1] "GEO_ID"                "STATE"                 "COUNTY"               
##  [4] "NAME"                  "LSAD"                  "CENSUSAREA"           
##  [7] "COUNTYFIPS"            "COUNTY.x"              "STATE.x"              
## [10] "REGION.x"              "County_Pop"            "ACS_MEDIAN_HH_INC"    
## [13] "ACS_PER_CAPITA_INC"    "ACS_PCT_POV_AIAN"      "ACS_MEDIAN_HOME_VALUE"
## [16] "ACS_PCT_UNINSURED"     "CHR_PCT_ADULT_OBESITY" "CHR_PCT_DIABETES"     
## [19] "wm_FARA"               "geometry"

Before beginning the exploratory analysis, I also double-checked to verify there were no counties that were missing low food access (wm_FARA) values. I determined that in fact, there were no counties missing wm_FARA values.

y2<-test_counties_inner$wm_FARA  
countmissinglowaccess<-length(which(is.na(y2)))
countmissinglowaccess 
## [1] 0

Also before beginning the exploratory analysis, I verified that no counties were missing values for diabetes prevalence.

y_dm<-test_counties_inner$CHR_PCT_DIABETES  
countmissingdm<-length(which(is.na(y_dm)))
countmissingdm 
## [1] 0

I began the exploratory analysis by generating summary statistics for the low access data, notably a mean and a median low access (wm_FARA) percentage across continental U.S. counties.

Mean low access percentage

mean_LA<- test_counties_inner %>% dplyr::summarise(Mean_LA_County = mean(wm_FARA, na.rm = TRUE))
mean_LA 
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS:  WGS 84
##   Mean_LA_County                       geometry
## 1        38.1614 MULTIPOLYGON (((-85.56778 4...

Therefore mean county low access percentage across continental U.S. counties was 38.16%.

Median low access percentage

median_LA<- test_counties_inner %>% dplyr::summarise(Median_LA_County = median(wm_FARA, na.rm = TRUE))
median_LA 
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS:  WGS 84
##   Median_LA_County                       geometry
## 1         36.44304 MULTIPOLYGON (((-85.56778 4...

Therefore median county low access percentage across continental U.S. counties was 36.44%.

Next I calculated the number of counties that had 100% of their population with low access to food and identified them.

y<-test_counties_inner$wm_FARA  

counttoplowaccess<-length(which(y == 100))
counttoplowaccess 
## [1] 55

Therefore 55 counties had 100% of the population with low access to food.

Here are the identities of those 100% low access counties.

Lowest_access_counties<- test_counties_inner %>% dplyr::select(COUNTYFIPS, REGION.x, STATE.x, COUNTY.x, wm_FARA) %>% slice_max(wm_FARA) 
Lowest_access_counties
## Simple feature collection with 55 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -120.4952 ymin: 29.6235 xmax: -79.31228 ymax: 48.99902
## Geodetic CRS:  WGS 84
## First 10 features:
##    COUNTYFIPS REGION.x  STATE.x             COUNTY.x wm_FARA
## 1       20075  Midwest   Kansas      Hamilton County     100
## 2       20203  Midwest   Kansas       Wichita County     100
## 3       30037     West  Montana Golden Valley County     100
## 4       31183  Midwest Nebraska       Wheeler County     100
## 5       30059     West  Montana       Meagher County     100
## 6       30069     West  Montana     Petroleum County     100
## 7       30107     West  Montana     Wheatland County     100
## 8       31009  Midwest Nebraska        Blaine County     100
## 9       31085  Midwest Nebraska         Hayes County     100
## 10      31117  Midwest Nebraska     McPherson County     100
##                          geometry
## 1  MULTIPOLYGON (((-101.5675 3...
## 2  MULTIPOLYGON (((-101.5671 3...
## 3  MULTIPOLYGON (((-108.7801 4...
## 4  MULTIPOLYGON (((-98.29576 4...
## 5  MULTIPOLYGON (((-110.2819 4...
## 6  MULTIPOLYGON (((-108.6309 4...
## 7  MULTIPOLYGON (((-109.6543 4...
## 8  MULTIPOLYGON (((-99.68696 4...
## 9  MULTIPOLYGON (((-100.7774 4...
## 10 MULTIPOLYGON (((-101.2697 4...
View(Lowest_access_counties)

I then calculated the counties where greater than 75% of the population had low food access and identified these counties.

yseventyfive<- test_counties_inner$wm_FARA
countseventyfive<- length(which(yseventyfive > 75))
countseventyfive 
## [1] 165

Therefore in 165 counties greater than 75% of the population had low food access.

Here are those at least 75% of the population with low access counties identified.

Topquartile<- test_counties_inner %>% dplyr::select(COUNTYFIPS, REGION.x, STATE.x, COUNTY.x, wm_FARA) %>% filter(wm_FARA > 75)
Topquartile
## Simple feature collection with 165 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.7279 ymin: 26.06684 xmax: -71.2248 ymax: 49.00027
## Geodetic CRS:  WGS 84
## First 10 features:
##    COUNTYFIPS REGION.x  STATE.x             COUNTY.x   wm_FARA
## 1       08014     West Colorado    Broomfield County  79.02029
## 2       12111    South  Florida     St. Lucie County  78.43846
## 3       13059    South  Georgia        Clarke County  76.37750
## 4       08111     West Colorado      San Juan County  99.24000
## 5       20049  Midwest   Kansas           Elk County  99.36000
## 6       20075  Midwest   Kansas      Hamilton County 100.00000
## 7       20203  Midwest   Kansas       Wichita County 100.00000
## 8       30037     West  Montana Golden Valley County 100.00000
## 9       31163  Midwest Nebraska       Sherman County  83.43000
## 10      31171  Midwest Nebraska        Thomas County  99.94000
##                          geometry
## 1  MULTIPOLYGON (((-105.1473 3...
## 2  MULTIPOLYGON (((-80.67786 2...
## 3  MULTIPOLYGON (((-83.36003 3...
## 4  MULTIPOLYGON (((-107.7383 3...
## 5  MULTIPOLYGON (((-96.52569 3...
## 6  MULTIPOLYGON (((-101.5675 3...
## 7  MULTIPOLYGON (((-101.5671 3...
## 8  MULTIPOLYGON (((-108.7801 4...
## 9  MULTIPOLYGON (((-98.74853 4...
## 10 MULTIPOLYGON (((-100.8461 4...
View(Topquartile)

Using the 75% of population with low access threshold above, I then determined which regions had the lowest food access.

Topregions<- Topquartile %>% dplyr::summarise(count(REGION.x)) 
Topregions
## Simple feature collection with 4 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.7279 ymin: 26.06684 xmax: -71.2248 ymax: 49.00027
## Geodetic CRS:  WGS 84
##           x freq                       geometry
## 1   Midwest   67 MULTIPOLYGON (((-101.2277 4...
## 2 Northeast    1 MULTIPOLYGON (((-101.2277 4...
## 3     South   55 MULTIPOLYGON (((-101.2277 4...
## 4      West   42 MULTIPOLYGON (((-101.2277 4...
Topregions_summary<- dplyr::arrange(Topregions, desc(freq))
Topregions_summary
## Simple feature collection with 4 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.7279 ymin: 26.06684 xmax: -71.2248 ymax: 49.00027
## Geodetic CRS:  WGS 84
##           x freq                       geometry
## 1   Midwest   67 MULTIPOLYGON (((-101.2277 4...
## 2     South   55 MULTIPOLYGON (((-101.2277 4...
## 3      West   42 MULTIPOLYGON (((-101.2277 4...
## 4 Northeast    1 MULTIPOLYGON (((-101.2277 4...

As noted in the table, ranked by number of counties with >75% of population with low food access: Midwest had 67 counties; South had 55 counties; West had 42 counties, Northeast had 1 county, for 165 counties total.

I calculated how many counties have 0% of their population with low food access and identified them.

y1<-test_counties_inner$wm_FARA  
countleastlowaccess<-length(which(y1 == 0))
countleastlowaccess 
## [1] 38

Therefore 38 counties have 0% of the population with low access to food.

These 38 counties where 0% of the population had low access to food were identified.

Highest_access_counties<- test_counties_inner %>% dplyr::select(COUNTYFIPS, REGION.x, STATE.x, COUNTY.x, wm_FARA) %>% slice_min(wm_FARA) 
Highest_access_counties
## Simple feature collection with 38 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -105.6903 ymin: 31.43369 xmax: -76.24528 ymax: 45.21074
## Geodetic CRS:  WGS 84
## First 10 features:
##    COUNTYFIPS REGION.x        STATE.x          COUNTY.x wm_FARA
## 1       08047     West       Colorado     Gilpin County       0
## 2       17035  Midwest       Illinois Cumberland County       0
## 3       18115  Midwest        Indiana       Ohio County       0
## 4       21077    South       Kentucky   Gallatin County       0
## 5       37103    South North Carolina      Jones County       0
## 6       47057    South      Tennessee   Grainger County       0
## 7       48379    South          Texas      Rains County       0
## 8       48425    South          Texas  Somervell County       0
## 9       13109    South        Georgia      Evans County       0
## 10      13119    South        Georgia   Franklin County       0
##                          geometry
## 1  MULTIPOLYGON (((-105.3978 3...
## 2  MULTIPOLYGON (((-88.01212 3...
## 3  MULTIPOLYGON (((-84.84945 3...
## 4  MULTIPOLYGON (((-84.66011 3...
## 5  MULTIPOLYGON (((-77.47372 3...
## 6  MULTIPOLYGON (((-83.66746 3...
## 7  MULTIPOLYGON (((-95.66539 3...
## 8  MULTIPOLYGON (((-97.86486 3...
## 9  MULTIPOLYGON (((-81.96907 3...
## 10 MULTIPOLYGON (((-83.35527 3...
View(Highest_access_counties)

I continued the exploratory analysis by generating summary statistics about the diabetes prevalence percentage (CHR_PCT_DIABETES) data.

Mean percentage prevalence of population with diabetes across U.S. counties.

mean_DM<- test_counties_inner %>% dplyr::summarise(Mean_DM_County = mean(CHR_PCT_DIABETES, na.rm = TRUE))
mean_DM 
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS:  WGS 84
##   Mean_DM_County                       geometry
## 1        11.6589 MULTIPOLYGON (((-85.56778 4...

Therefore across continental U.S. counties, the mean percentage of the population with diabetes was 11.66%.

Median percentage prevalence of population with diabetes across U.S. counties.

median_DM<- test_counties_inner %>% dplyr::summarise(Median_DM_County = median(CHR_PCT_DIABETES, na.rm = TRUE))
median_DM 
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS:  WGS 84
##   Median_DM_County                       geometry
## 1             11.5 MULTIPOLYGON (((-85.56778 4...

Therefore across continental U.S. counties, median percentage of population with diabetes was 11.5%.

I identified the top 100 counties in terms of greatest percentage of population with diabetes in 2015.

Diabetes_high<- test_counties_inner %>% dplyr::select(COUNTYFIPS, REGION.x, STATE.x, COUNTY.x, CHR_PCT_DIABETES) %>% slice_max(CHR_PCT_DIABETES, n = 100)
View(Diabetes_high) 

Note that the search for the top 100 actually returned 109 counties given that a number of counties were tied in their diabetes prevalence percentages.

I determined what regions were represented by these top 109 counties with the greatest percentage of population with diabetes.

Topregions_DM<- Diabetes_high %>% dplyr::summarize(count(REGION.x) %>% dplyr::arrange(desc(freq)))
View(Topregions_DM)

Therefore 105 of the top 109 counties in terms of greatest percentages of population with diabetes were in the South. 4 were in the Midwest.

I determined the 100 counties with the lowest percentage of their population with diabetes.

Diabetes_low<- test_counties_inner %>% dplyr::select(COUNTYFIPS, REGION.x, STATE.x, COUNTY.x, CHR_PCT_DIABETES) %>% slice_min(CHR_PCT_DIABETES, n = 100)
View(Diabetes_low)  

Note that the search for the 100 counties with the lowest percentage of their populations with diabetes actually returned 111 entries given a number of counties were tied in their population percentages with diabetes.

I determined what regions were represented by these 111 counties with the lowest percentage of population with diabetes.

Bottomregions_DM<- Diabetes_low %>% dplyr::summarize(count(REGION.x) %>% dplyr::arrange(desc(freq)))
View(Bottomregions_DM)

Therefore, in rank order by lowest percentage of population with diabetes among these 111 counties: West (69 counties), Midwest (25 counties), Northeast (9 counties), South (8 counties)

Here I prepared for another component of the exploratory analysis, choropleth and leaflet creation.

library(RColorBrewer)

my_theme <- function() {
  theme_minimal() +                                  
  theme(axis.line = element_blank(),                 
        axis.text = element_blank(),                 
        axis.title = element_blank(),
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.5, "cm"),          
        legend.text = element_text(size = 10),       
        legend.title = element_text(size = 12),
        plot.title = element_text(size = 16))      
}  
myPalette <- colorRampPalette(brewer.pal(9, "YlOrRd")) 


library(leaflet)
pal_fun4 <- colorNumeric("inferno", NULL)   # inferno from viridis

I generated a low food access choropleth.

choro_lowaccess<- ggplot() +
  geom_sf(data = test_counties_inner, aes(fill = wm_FARA), lwd = 0) +
  my_theme() +
  ggtitle("Percent of population with low food access, by U.S. county, 2015") + 
  scale_fill_continuous(type = "viridis", guide = "colorbar") +
  north(x.min = -75.28031, y.min = 39.86747,    # add north bar with north()
        x.max = -74.95575, y.max = 40.13793,    # set map boundaries with x.min, y.min, etc..      
        symbol=12,                              # select north arrow symbol
        anchor = c(x = -75, y = 39.93)) +       # set north bar location
  scalebar(x.min = -75.28031, y.min = 39.86747, # add scalebar with scalebar()
           x.max = -74.95575, y.max = 40.13793, # set map boundaries
           dist = 5, dist_unit = 'km',          # set scalebar segment length = 5km
           transform = TRUE,                    # TRUE for decimal degree coordinates
           model = "WGS84")                      # specify CR

choro_lowaccess

I generated a diabetes prevalence percentage choropleth.

choro_diabetes<- ggplot() +
  geom_sf(data = test_counties_inner, aes(fill = CHR_PCT_DIABETES), lwd = 0) +
  my_theme() +
  ggtitle("Percent of population with diabetes, by U.S. county, 2015") +
  scale_fill_continuous(type = "viridis", guide = "colorbar") +
  north(x.min = -75.28031, y.min = 39.86747,    # add north bar with north()
        x.max = -74.95575, y.max = 40.13793,    # set map boundaries with x.min, y.min, etc..      
        symbol=12,                              # select north arrow symbol
        anchor = c(x = -75, y = 39.93)) +       # set north bar location
  scalebar(x.min = -75.28031, y.min = 39.86747, # add scalebar with scalebar()
           x.max = -74.95575, y.max = 40.13793, # set map boundaries
           dist = 5, dist_unit = 'km',          # set scalebar segment length = 5km
           transform = TRUE,                    # TRUE for decimal degree coordinates
           model = "WGS84")                      # specify CR

choro_diabetes

I created a leaflet for low food access.

#Popup message - low access
pu_message1 <- paste0(test_counties_inner$COUNTY.x, ", ", test_counties_inner$STATE.x, "<br>Low food access percentage 2015: ", round(test_counties_inner$wm_FARA, 1), "%")

#Interactive map with legend and scalebar
leaflet_lowaccess<-leaflet(test_counties_inner) %>%
  addPolygons(fillColor = ~pal_fun4(wm_FARA),  
  popup = pu_message1) %>%             
  addTiles() %>% addLegend("bottomright", pal=pal_fun4,        values = ~wm_FARA,
            title = '2015 Low food access percentages by U.S. county',
            opacity = 1) %>% addScaleBar() 

leaflet_lowaccess

I created a leaflet for diabetes prevalence.

#Popup message - diabetes
pu_message2 <- paste0(test_counties_inner$COUNTY.x, ", ", test_counties_inner$STATE.x, "<br>Diabetes prevalence 2015: ", round(test_counties_inner$CHR_PCT_DIABETES, 1), "%")

#Interactive map with legend and scalebar
leaflet_diabetes<- leaflet(test_counties_inner) %>%
  addPolygons(fillColor = ~pal_fun4(CHR_PCT_DIABETES),   
  popup = pu_message2) %>%             
  addTiles() %>% addLegend("bottomright", pal=pal_fun4,        values = ~CHR_PCT_DIABETES,
            title = '2015 Diabetes prevalence rates by U.S. county',
            opacity = 1) %>% addScaleBar() 

leaflet_diabetes

Next, I generated a scatterplot with smooth line showing the relationship between low food access (wm_FARA) and diabetes prevalence percentage (CHR_PCT_DIABETES).

library(modelsummary)
## 
## Attaching package: 'modelsummary'
## The following object is masked from 'package:tidycensus':
## 
##     get_estimates
ggplot(test_counties_inner, aes(x = wm_FARA, y = CHR_PCT_DIABETES)) +
    geom_point(color = "darkseagreen3") + 
    geom_smooth(method = "lm", color = "green4") + ggtitle("Scatterplot: Low food access vs. Diabetes prevalence")
## `geom_smooth()` using formula 'y ~ x'

I calculated the Pearson correlation coefficient for the above relationship, which showed a mild to moderate negative relationship between wm_FARA and CHR_PCT_DIABETES.

cor(test_counties_inner$wm_FARA, test_counties_inner$CHR_PCT_DIABETES, use = "complete.obs") 
## [1] -0.3060672

4.2 Results of unadjusted linear regression model analysis

I then created an unadjusted linear regression model to further assess the relationship between wm_FARA and CHR_PCT_DIABETES.

LA_DM_lm_fit <- lm(CHR_PCT_DIABETES ~ wm_FARA, data = test_counties_inner)
LA_DM_lm_fit
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ wm_FARA, data = test_counties_inner)
## 
## Coefficients:
## (Intercept)      wm_FARA  
##    12.93230     -0.03337

I then summarized this linear model, LA_DM_lm_fit. Findings indicate a very small negative relationship that was statistically significant between wm_FARA and CHR_PCT_DIABETES.

summary_LA_DM_lm_fit <- summary.lm(LA_DM_lm_fit) #Summary of results
summary_LA_DM_lm_fit
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ wm_FARA, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7474 -1.7094 -0.0926  1.5925  9.4299 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.932301   0.083690  154.53   <2e-16 ***
## wm_FARA     -0.033369   0.001863  -17.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.462 on 3105 degrees of freedom
## Multiple R-squared:  0.09368,    Adjusted R-squared:  0.09339 
## F-statistic: 320.9 on 1 and 3105 DF,  p-value: < 2.2e-16

4.3 Results of other variable exploration and adjusted, multivariable linear regression model analysis

Next, I explored potentially confounding variables in the relationship between wm_FARA and CHR_PCT_DIABETES. I started by creating scatterplots showing the relationship between four variables - median household income, per capita income, obesity percentage by county, and percentage of a county population that was uninsured - with diabetes prevalence. I also created linear models to assess the relationship between each of these variables and diabetes prevalence.

Median household income

plot_medhhincome<-ggplot(test_counties_inner, aes(x = ACS_MEDIAN_HH_INC, y = CHR_PCT_DIABETES)) +
  geom_point(color = "sienna4") + ggtitle("Med. HH income r/t to diabetes")
summary((lm(CHR_PCT_DIABETES ~ ACS_MEDIAN_HH_INC, data = test_counties_inner)))
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ ACS_MEDIAN_HH_INC, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4296 -1.3736 -0.0069  1.3735  7.4564 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.729e+01  1.524e-01  113.42   <2e-16 ***
## ACS_MEDIAN_HH_INC -1.207e-04  3.163e-06  -38.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.134 on 3104 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3193, Adjusted R-squared:  0.3191 
## F-statistic:  1456 on 1 and 3104 DF,  p-value: < 2.2e-16
plot_medhhincome
## Warning: Removed 1 rows containing missing values (geom_point).

Per capita income

plot_percapincome<-ggplot(test_counties_inner, aes(x = ACS_PER_CAPITA_INC, y = CHR_PCT_DIABETES)) +
  geom_point(color = "sienna4") + ggtitle("Per capita income r/t diabetes")
summary((lm(CHR_PCT_DIABETES ~ ACS_PER_CAPITA_INC, data = test_counties_inner)))
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ ACS_PER_CAPITA_INC, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8249 -1.4033  0.0386  1.3774  8.5143 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.752e+01  1.693e-01  103.47   <2e-16 ***
## ACS_PER_CAPITA_INC -2.413e-04  6.786e-06  -35.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.18 on 3105 degrees of freedom
## Multiple R-squared:  0.2895, Adjusted R-squared:  0.2892 
## F-statistic:  1265 on 1 and 3105 DF,  p-value: < 2.2e-16
plot_percapincome

Obesity

plot_obesity<-ggplot(test_counties_inner, aes(x = CHR_PCT_ADULT_OBESITY, y = CHR_PCT_DIABETES)) +
  geom_point(color = "sienna4") +
  ggtitle("Obesity pct r/t to diabetes") 
summary((lm(CHR_PCT_DIABETES ~ CHR_PCT_ADULT_OBESITY, data = test_counties_inner)))
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ CHR_PCT_ADULT_OBESITY, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1775 -1.2811 -0.0812  1.2155  7.1327 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.507953   0.243063   -2.09   0.0367 *  
## CHR_PCT_ADULT_OBESITY  0.379292   0.007501   50.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.915 on 3105 degrees of freedom
## Multiple R-squared:  0.4516, Adjusted R-squared:  0.4514 
## F-statistic:  2557 on 1 and 3105 DF,  p-value: < 2.2e-16
plot_obesity

Uninsured status percentage

plot_uninsured<-ggplot(test_counties_inner, aes(x = ACS_PCT_UNINSURED, y = CHR_PCT_DIABETES)) +
  geom_point(color = "sienna4") + ggtitle("Uninsured pct r/t to diabetes")
summary((lm(CHR_PCT_DIABETES ~ ACS_PCT_UNINSURED, data = test_counties_inner)))
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ ACS_PCT_UNINSURED, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9911 -1.7029 -0.0521  1.6202  8.8322 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       10.077338   0.122469   82.28   <2e-16 ***
## ACS_PCT_UNINSURED  0.118761   0.008552   13.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.51 on 3105 degrees of freedom
## Multiple R-squared:  0.05847,    Adjusted R-squared:  0.05817 
## F-statistic: 192.8 on 1 and 3105 DF,  p-value: < 2.2e-16
plot_uninsured

Of these individual linear models, as shown above, the variable with the strongest predictive relationship with diabetes prevalence was obesity, which has the greatest magnitude coefficient at 0.38, a significant relationship with diabetes prevalence, and an adjusted R2 value of 0.45.

To see all of the above plots at a glance, I created a grid of the above plots.

plot_grid(plot_medhhincome, plot_percapincome, plot_obesity, plot_uninsured, labels = "AUTO")
## Warning: Removed 1 rows containing missing values (geom_point).

In considering a multivariable linear model to assess the relationship between wm_FARA and CHR_PCT_DIABETES, I needed to give consideration to the likely collinearity of some of the above variables. As such, for example, I limited the multivariable linear models to exclude potentially confounding variables that were likely to be collinear, such as median household income and per capita income.

In this linear model, I adjusted for obesity and uninsured status percentage.

lm_2<- lm(CHR_PCT_DIABETES ~ wm_FARA + CHR_PCT_ADULT_OBESITY + ACS_PCT_UNINSURED, data = test_counties_inner)
summary(lm_2)
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ wm_FARA + CHR_PCT_ADULT_OBESITY + 
##     ACS_PCT_UNINSURED, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9650 -1.1821 -0.0536  1.1685  7.2007 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -0.108139   0.254901  -0.424    0.671    
## wm_FARA               -0.020168   0.001378 -14.636   <2e-16 ***
## CHR_PCT_ADULT_OBESITY  0.351733   0.007152  49.182   <2e-16 ***
## ACS_PCT_UNINSURED      0.094156   0.006114  15.400   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.789 on 3103 degrees of freedom
## Multiple R-squared:  0.5217, Adjusted R-squared:  0.5213 
## F-statistic:  1128 on 3 and 3103 DF,  p-value: < 2.2e-16

In this linear model, I adjusted for obesity and median household income

lm_3<- lm(CHR_PCT_DIABETES ~ wm_FARA + CHR_PCT_ADULT_OBESITY + ACS_MEDIAN_HH_INC, data = test_counties_inner)
summary(lm_3)
## 
## Call:
## lm(formula = CHR_PCT_DIABETES ~ wm_FARA + CHR_PCT_ADULT_OBESITY + 
##     ACS_MEDIAN_HH_INC, data = test_counties_inner)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1975 -1.1735 -0.0481  1.1394  5.7481 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            6.079e+00  3.253e-01   18.68   <2e-16 ***
## wm_FARA               -1.510e-02  1.341e-03  -11.25   <2e-16 ***
## CHR_PCT_ADULT_OBESITY  2.883e-01  7.499e-03   38.44   <2e-16 ***
## ACS_MEDIAN_HH_INC     -6.627e-05  2.866e-06  -23.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.715 on 3102 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5608, Adjusted R-squared:  0.5604 
## F-statistic:  1320 on 3 and 3102 DF,  p-value: < 2.2e-16

In these adjusted, multivariable linear models, the small, negative, and statistically significant relationship between wm_FARA and CHR_PCT_DIABETES remained. The other variables assessed also had small but significant relationships with CHR_PCT_DIABETES.

5 Limitations

This project had several limitations.

There was not an evident way to capture purely type 2 diabetes data from available surveys. Interestingly, the CHR_PCT_DIABETES variable counts any diagnosis of diabetes, whether type 1, type 2, or other. In this project, the diabetes of interest is type 2 diabetes given that type 2 diabetes is associated with modifiable lifestyle factors. Still, given that 90-95% of all U.S. diabetes cases are type 2, this project is still capable of providing a reasonably strong estimate of any association between low access and type 2 diabetes prevalence.(9)

Given limitations in dataset availability, I was only able to do this analysis for the most recent year that both low access and diabetes data were available, 2015; ideally I would have had more recent data and been able to inspect changes over several years.

The exclusion of non-continental United States counties and territories should be noted, particularly in considering the summary statistics. Given that these areas external to the continental United States were not analyzed, it is possible that these reported statistics could be significantly different if one were to use data that included the United States (including territories) as a whole.

On a similar note, 1625 tracts were excluded from the analysis due to missing FARA_PCT_LA_HALF_10_POP data. It is unclear why these tracts had missing data. It is also unknown whether, had data been available, these tracts would have had a distribution of low access values similar to counties for which data was available (with similar mean, median) or perhaps have skewed toward lower or higher food access.

While I have used a linear regression model for this descriptive analysis of two continuous variables, in reality, a linear regression model is not optimal for outcomes measured as percentages, as these are bounded by 0 at the low end and 100 at the high end. It is possible that a generalized linear model may be a better fit for this type of data but more investigation would be required to confirm this as the most appropriate model. This project did not attempt to perform an exhaustive analysis of all the potentially confounding variables. For example, average or median age by county might have been a potential confounder to the relationship between wm_FARA percentage and diabetes percentage, but this particular variable was not evaluated.

6 Conclusion

Every one unit increase in the weighted mean low food access percentage was associated with a very slight but statistically significant decrease (0.01 to 0.03) in the diabetes prevalence percentage rate in both the unadjusted and adjusted linear models. Of note, the low adjusted R-squared value (ranges from 0 to 1 with 1 being ideal) of 0.09339 from the unadjusted model tells us that very little of the variance in diabetes prevalence percentage can be explained by changes in wm_FARA.

While statistical significance for this relationship was achieved, the very small magnitude of the wm_FARA coefficient likely renders this finding clinically insignificant; that is, there is effectively no meaningful relationship between wm_FARA percentage and the percentage prevalence of diabetes. In effect, these results serve to reject my initial hypothesis which was that there might be a positive association between the percentage of the population with low food access and the prevalence of diabetes.

In multivariable linear models, obesity, uninsured status, and median household income also had very small but statistically significant associations with the prevalence of diabetes, positive for obesity and negative for uninsured status and median household income.

Still, while perhaps there is no meaningful relationship between worsening food access and diabetes prevalence in continental United States counties in 2015 as identified in this project, improving food security in the United States as a general matter remains no less important for a host of other reasons. Indeed, professional societies, including the American College of Physicians, have recently recognized the multiple health and societal consequences of food insecurity and called on governments to do more to address this pressing and widespread problem.(10)

7 References

  1. Healthy People 2030. United States Department of Health and Human Services. Available at: https://health.gov/healthypeople/priority-areas/social-determinants-health. Accessed 12/6/22.

  2. The cost of diabetes. American Diabetes Association. Available at: https://diabetes.org/about-us/statistics/cost-diabetes#:~:text=People%20with%20diagnosed%20diabetes%20incur,in%20the%20absence%20of%20diabetes. Accessed 12/6/22.

  3. Tuomilehto J, Lindström J, Eriksson JG, et al. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N Engl J Med. 2001;344(18):1343-1350. doi:10.1056/NEJM200105033441801

  4. Tello, M. Healthy lifestyle can prevent diabetes (and even reverse it). Harvard Health Blog. 2018; September 6. Available at: https://www.health.harvard.edu/blog/healthy-lifestyle-can-prevent-diabetes-and-even-reverse-it-2018090514698

  5. Food Security in the U.S.: Measurement. Economic Research Service, United States Department of Agriculture. https://www.ers.usda.gov/topics/food-nutrition-assistance/food-security-in-the-u-s/measurement/#insecurity. Accessed 12/6/22.

  6. Mapping Food Deserts in the United States. Economic Research Service, United States Department of Agriculture. https://www.ers.usda.gov/amber-waves/2011/december/data-feature-mapping-food-deserts-in-the-us/. Accessed 12/6/22.

  7. Odegaard AO, Koh WP, Yuan JM, Gross MD, Pereira MA. Western-style fast food intake and cardiometabolic risk in an Eastern country. Circulation. 2012;126(2):182-188. doi:10.1161/CIRCULATIONAHA.111.084004

  8. State Health Facts: Health Insurance Coverage of the Total Population. Kaiser Family Foundation. Available at: https://www.kff.org/other/state-indicator/total-population/?currentTimeframe=5&selectedDistributions=medicaid--medicare&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 12/6/22.

  9. Type 2 Diabetes. Centers for Disease Control and Prevention. Available at: https://www.cdc.gov/diabetes/basics/type2.html#:~:text=Healthy%20eating%20is%20your%20recipe,adults%20are%20also%20developing%20it. Accessed 12/8/22.

  10. Serchen J, Atiq O, Hilden D; Health and Public Policy Committee of the American College of Physicians. Strengthening Food and Nutrition Security to Promote Public Health in the United States: A Position Paper From the American College of Physicians. Ann Intern Med. 2022;175(8):1170-1171. doi:10.7326/M22-0390